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Abstract. We describe a highly flexible framework to 
solve 3D radiation transfer problems in scattering dom- 
inated environments based on a long characteristics piece- 
wise parabolic formal solution and an operator splitting 
method. We find that the linear systems are efficiently 
solved with iterative solvers such as Gauss-Seidel and Jor- 
dan techniques. We use a sphere-in- a-box test model to 
compare the 3D results to ID solutions in order to as- 
sess the accuracy of the method. We have implemented 
the method for static media, however, it can be used to 
solve problems in the Eulerian-frame for media with low 
velocity fields. 



1. Introduction 

With the increase in computer power in the last few 
years, 3D hydrodynamical calculations are becoming in- 
creasingly common in astrophysics. Most hydrodynamical 
calculations treat radiation in a simplified manner, since a 
full solution of the 3-D non-LTE radiative transfer prob- 
lem is numerically much more expensive than the hydro- 
dynamical calculation itself, which already stretch the lim- 
its of modern parallel computers. In many instances, such 
as the thermonuclear explosion of a white dwarf (thought 
to be the progenitor of Type la supernova), the goal of the 
hydrodynamical simulations is to understand the mode of 
combustion and to handle the effects of turbulence in as 
realistic manner as possible and radiative transfer effects 
are ignored. However, in general since the ultimate valida- 
tion or falsification of the results of sophisticated hydro- 
dynamical modeling will be via comparison with the ob- 
served radiation from the astrophysical object being stud- 
ied, and the radiation strongly affects the physical state of 
the matter in the atmosphere of the object (where the ob- 
served radiation originates) the effect of detailed radiative 
transfer effects cannot be ignored. 

In many multi-di mensional hyd rodynamics codes (for 
example ZEUS3D. iNormanl I200 Q*) . radiative transfer is 
treated in a simplified matter in order to determine the 
amount of energy transfered between the matter and 



radiat ion (the cooling function) although iDvkema et alJ 
l|l99rf) presented a full time-dependent 2-D NLTE radia- 
tive transfer code, based on a variable Eddington factor 
method and equivalent two l e vel at om formulation. 

Recently iRiikhorst et alJ l|2005f) presented a method 
of including 3-D radiative transfer into modern 3-D hy- 
drodynamical codes (such as the ASCI FLASH code) but 
they only treated the solution of the radiative transfer 
equation in the absence of scattering, that is they their 
method only treats the formal solution of the radiative 
transfer equation and not the full self consistent scatter- 
ing problem where the right hand side of the radiative 
transfer problem involves the radiation field itself. How- 
ever, in astrophysical systems, the effect of scattering can- 
not be ignored and in fact it is due to scattering that the 
radiation field decouples from the local emission and ab- 
sorption and the effects of the existence of the b oundary 
are c ommunicated globally over the atmosphere l|Mihalasl 
Il978j) . It is just this strong non- locality of the radiation 
field that makes the solution of the generalized radiative 
tran sfer pr obl em so computationally demanding. 

ISteineil l|l99lh presented a 2-D multi-grid 
method based on the short-characteri stics method 
l|01son fc Kunasd Il987t lOlson et all Il987|) and showed 
that it worked in the case of a purely absorptive at- 
mosphere, i.e., that the formal solution was tractable. 



I Vat hi ( 1994^1 presented a 3-D short characteristics method 
and showed that it had adequate scaling on a SIMD 
parallel architecture. In a series of papers 3-D, short 
ch aracteri s tics m e thods for disk syst e ms were presented 
bv lAdaml lll99f|: iHummell (|l994albl) : iHummel fc Dachsl 
(1992); Panka,lhI <|l99fllL Our method is similar in spirit to 
these works, but we present a more detailed description 
of the construction of the approximate lambda operator 
(ALO), and the method of solution of the scattering 
problem. 

iFabiani Bendicho et alJ l|l997j) presented a multi-level, 
multi-grid, multi-dimensional radiative transfer scheme, 
using a lower triangular ALO and solving the scattering 
problem via a Gauss-Seidel method. 
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Ivan Noort. Hubenv. fc Lanzl l|2002j) presented a 
method of solving the full NLTE radiative transfer 
problem using the short characteristics method in 2-D 
for Cartesian, spherical, and cylindrical geometry. They 
also u sed the technique of accelerated lambda iteration 
(ALI) ((Olson et al.lll987tl01son & Kunaszlll987h . however 
they restricted themselves to the case of a diagonal ALO. 
In addition they considered the case including velocity 
fields, but their method is feasible only in the case of a 
small velocity gradient across the atmosphere. 

We describe below a simple framework to solve three 
dimensional (3D) radiation transport (RT) problems with 
a non-local operator splitting method. In subsequent pa- 
pers we will extend this framework to solve 3D RT prob- 
lems in relativistically moving configurations. Our method 
is similar to those described above, although we con- 
sider both short characteristics and long characteristics 
methods for the formal solution of the radiative trans- 
fer equation. We show that long-characteristics produce 
a significantly better numerical solution in our test cases 
for a strongly scattering dominated atmosphere. Short- 
characteristics are known to be diffusive and it is appar- 
ent in our numerical results. We also implement a partial 
parallelization of the method, although we defer a full dis- 
cussion of the parallelization to future work. 



2. Method 



In the following discussion we use notation of lHauschildtl 
First, we will describe the process for the formal 
solution, then we will describe how we construct the non- 
local approximate A operator, A*, and methods to solve 
the necessary linear equations in the operator splitting 
step. 

2.1. Framework 

The 3D RT equations ar e easiest solved in a Cartesia n 
coordinate system (e.g., iFabiani Bendicho et al.lll99^ . 
therefore we use a Cartesian grid of volume cells (vox- 
els). Basically, the voxels are allowed to have different 
sizes but for the tests presented later in this paper we 
use fixed size voxels (as the simplest option). The values 
of physical quantities, such as temperatures, opacities and 
mean intensities, are averages over a voxel, which, there- 
fore, also fixes the local physical resolution of the grid. 
In the following we will specify the size of the voxel grid 
by the number of voxels along each positive axis, e.g., 
= 32 specifies a voxel grid from voxel 
coordinates (-32, -32, -32) to (32, 32, 32) for a total of 
(2*32 + 1) 3 = 274625 voxels, 65 along each axis. The voxel 
(0, 0, 0) is at the center of the voxel grid. The voxel centers 
are the grid points. The voxel coordinates are related by 
grid scaling factors to physical space, depending on the 
problem. The framework does not require n x = n y — n z , 



we use this for the tests presented in this paper for con- 
venience. 

The applications that wc intend to solve with the 
3DRT framework will involve optically thick environments 
with a significant scattering contribution, e.g., modeling 
the light reflected by an extrasolar giant planet close to 
its parent star. Therefore, not only a formal solution is 
required but the full solution of the 3D radiative trans- 
fer equation with scattering. In this paper, we describe a 
method based on the operator splitting approach. Opera- 
tor splitting works best if a non-local A* opera tor is used 
in the calculations (e.g.. lHauschildt et alJlf994|) . therefore 
we describe a non-local A* method here. The operator 
splitting method can be combined with other methods, 
like multigrids, to allow for greater flexibility and better 
convergence, which we will discuss in a later paper. 

2.2. Radiative transfer equation 

The static radiative transfer equation in 3-D may be writ- 
ten 

n • x, n) = r)(y, x) — x{v, x)I{v, x, n) (I) 
where I{v, x, n) is the specific intensity at frequency v, 
position x, in the direction n, rj(y,x) is the emissivity at 
frequency v and position x, and x{ v t x ) is the total ex- 
tinction at frequency v and position x. The source func- 
tion S = rj/x- Here, we will work in the steady-state 
so that dl/dt = 0, and in Cartesian coordinates so the 
V = + + jjj and the direction n is defined by two 
angles (6, (f>) at the position x. 

2.3. The operator splitting method 

The mean intensity J is obtained from the source function 
S by a formal solution of the RTE which is symbolically 
written using the A-operator A as 

J = AS. (2) 
The source function is given by S = (1 — e)J + eB, where e 
denotes the thermal coupling parameter and B is Planck's 
function. 

The A-iteration method, i.e. to solve Eq.Elby a fixed- 
point iteration scheme of the form 

Jncw — ASoldj "Snow — (1 — e)Jncw ~\~ , (3) 

fails in the case of large optical depths and small e. Here, 
S'oid is the current estimate for the source function S and 
S'now is new, improved, extimate of S for the next iteration. 
The failure to converge of the A-iteration is caused by the 
fact that the large st eigenvalue of the a mplification matrix 
is approximately (Mihalas et 

max ~ (l-e)(l- 

T ), where T is the optical thickness of the medium. 
For small e and large T, this is very close to unity and, 
therefore, the convergence rate of the A-iteration is very 
poor. A physic al description of this effect can be found in 
lMihalasll|l98(t . 

The idea of the ALI or operator splitting (OS) method 
is to reduce the eigenvalues of the amplification matrix in 
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the iteration scheme l|Cauuonlll973l) by introducing an ap- 
proximate A-operator (ALO) A* and to split A according 
to 

A = A* + (A - A*) (4) 
and rewrite Eq. |2l as 

Jnew = A*5new + (A - A*) Sold- (5) 

This relation can be written as iHamannl 1 19871) 
[1 - A*(l - e )] J now = J fs - A*(l - e) J old , (6) 
where Jf s = ASW and J Q id is the current estimate of 
the mean intensity J. Equation is solved to get the new 
values of J which is then used to compute the new source 
function for the next iteration cycle. 

Mathematically, the OS method belongs to the same 
family of iterat ive methods as the Jacob i or the Gauss- 
Seidel methods ijGolub fe Van Loanll989|) . These methods 
have the general form 

Mx k+1 = Nx k + b (7) 
for the iterative solution of a linear system Ax = b where 
the system matrix A is split according to A = M — N . In 
the case of the OS method we have M = 1 — A*(l — e) 
and, accordingly, N = (A — A*)(l — e) for the system ma- 
trix A = 1 — A(l — e). The convergence of the iterations 
depends on the spectral radius, p(G), of the iteration ma- 
trix G = AI~ 1 N. For convergence the condition p(G) < 1 
must be fulfilled, this puts a restriction on the choice of A*. 
In general, the iterations will converge faster for a smaller 
spectral radius. To achieve a significant improvement com- 
pared to the A-iteration, the operator A* is constructed 
so that the eigenvalues of the iteration matrix G are much 
smaller than unity, resulting in swift convergence. Using 
parts of the exact A matrix (e.g., its diagonal or a tri- 
diagonal form) will optimally reduce the eigenvalues of 
the G. The calculation and the structure of A* should be 
simple in order to make the construction of the linear sys- 
tem in Eq. E|fast. For example, the choice A* = A is best 
in view of the convergence rate (it is equivalent to a direct 
solution by matrix inversion) but the explicit construction 
of A is more time consuming than the construction of a 
simpler A* . The solution of the system Eq. [5] in terms of 
linear algebra, u sing modern linear alg ebra packages such 
as, e.g., LAPACK ijAnderson et al.lll992IL is so fast that its 
CPU time can be neglected for the small number of vari- 
ables encountered in ID problems (typically the number 
of discrete shells is about 50). However, for 2D or 3D prob- 
lems the size of A gets very large due to the much larger 
number of grid points as compared to the ID case. Ma- 
trix inversions, which are necessary to solve Eq.EI directly, 
therefore become extremely time consuming. This makes 
the direct solution of Eq. H3 more CPU intensive even for 
A*'s of moderate bandwidth, except for the trivial case of 
a diagonal A*. Dif ferent metho ds like modified conjugate 
gradient methods llTurekl F9931 may be effective for these 
2D or 3D problems. 

The CPU time required for the solution of the RTE 
using the OS method depends on several factors: (a) the 
time required for a formal solution and the computation 



of Jf s , (b) the time needed to construct A*, (c) the time 
required for the solution of Eq. and (d) the number of 
iterations required for convergence to the prescribed accu- 
racy. Points (a), (b) and (c) depend mostly on the number 
of spatial points, and can be assumed to be fixed for any 
given configuration. However, the number of iterations re- 
quired to convergence depends strongly on the bandwidth 
of A*. This indicates, that there is an optimum bandwidth 
of the A*-operator which will result in the shortest pos- 
sible CPU time needed for the solution of the RTE, see 
lHa,uschildt et all fjflfli. 



2.4- Formal solution 

The formal solution through the voxel grid can be per- 
formed by a variety of methods. So f ar, we have imple- 
mented both a short-characteristic (SC. IOlson et alJl987l) 
and a long-characteristic (LC) method. Long and short 
characteristics are shown schematically in Fig. ^ I n our 
current implementation, the long characteristics are fol- 
lowed continuously through the voxel grid, the short char- 
acteristics start at the center of a voxel and step closest to 
the center of the next voxel. The distances along a (short 
or long) characteristic are then used to compute the op- 
tical depth steps. Along a characteristic (either short or 
long), the formal solution is computed using a piece- wise 
parabolic (PPM) or piece- wise linear (P LM) interpolation 
and i n tegrat i on of the source function ((Olson fc Kunasd 
Il987h . lAuerl l|2003h discusses the effect that high order 
interpolation may cause problems, therefore, we automat- 
ically use piece-wise linear interpolation if the change in 
the source function along the 3 points of the PPM step 
would be larger than a prescribed threshold (typically fac- 
tors of 100) or if the optical depth along the characteristic 
is very small (typically less than 10~ 3 ). Depending on the 
direction (6, <fi) of the characteristic, the formal solution 
proceeds through the voxel grid. 

Therefore, along a characteristic [which is in the static 
case just a straight line with given [9, </>)] the transport 
equation is simply 

^=I-S (8) 

With this definition, the formal solution of the RTE along 
the characteristics can be written in the following way (cf. 
lOlson fc Kunasd Il987l for a derivation of the formulae) 
where we have suppressed the index labeling the charac- 
teristic; Ti denotes the optical depth along the character- 
istic with T\ = and Ti_i < 7$ while r is calculated using 
piecewise linear interpolation of \ along the characteristic, 



i{n) = i"(Ti_i)exp(Ti_i - n) 

+ / S(t) exp(r — Ti) dr 

JTi-l 

I(Ti) = exp(-Ar i _ 1 ) + Mi 



(9) 
(10) 
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i labels the points along a characteristic and At; is cal- 
culated using piecewise linear interpolation of x along the 
characteristic 

At,_! = (xi-i + Xi)\si-i ~ ai|/2 (11) 
The starting points of each characteristic are at the center 
of the voxels on their starting faces of the voxel grid. 

The source function S(t) along a characteristic is in- 
terpolated by linear or parabolic polynomials so that 
Ah = oaSi-! + PiSi + jiS i+1 (12) 
with 

a-i = e 0i + [e 2 i - (An + 2AT l _i)e H ] 

/[An-^An + An-!)} (13) 

Pi = [{An + An-^eu - eayiAn-iAn] (14) 

7i = [ea- An^eul/iAniAn + An-i)] (15) 
for parabolic interpolation and 

ai = e 0i - eu/An-i (16) 

fa = eu/An-i (17) 

7< = (18) 
for linear interpolation. The auxiliary functions are given 

by 

e 0i = 1 - exp(-Ar,_i) (19) 
eu = An~i - e 0i (20) 
e 2l = (Ar J _ 1 ) 2 -2ei» (21) 
and Ari = n+x — n is the optical depth along the charac- 
teristic from point i to point i + 1. The linear coefficients 
have to be used (at least) at the last integration point 
along each characteristic, and for some cases it might be 
better to also use linear interpolation for some inner points 
so as to ensure stability. 

The integration over solid angle can be done in the 
static case using a simple Simpson or trapezoidal quadra- 
ture formula. However, in the case of Lagrangian frame 
radiation transport, the angles (0, </>) vary along a (curved) 
characteristic. Therefore, the (9, 0) grid changes for each 
voxel and developing a quadrature formula in advance re- 
quires a pass through all voxels, storing all (9, </>) points 
for each of them. For larger grids this will amount to sub- 
stantial long term memory requirements as the resulting 
quadrature weights will have to be stored for each (9, <fi) 
pair at all voxels. To avoid this, we have implemented a 
simple Monte-Carlo (MC) scheme to perform the integra- 
tion over solid angle. In the MC integration, the integral 



J = 



1 

-in 



2ji 



7sin( 



J 



/sin 9 



is replaced by a simple MC sum of the form 

1 

2V 2 ^ 

where the sum goes over all solid angle points (8, <f>). The 
{9, (f>) are randomly selected and given equal weight in the 
MC sums. This also works for precribed (8, <fr) grids as long 
as the number of (9, (j)) points is sufficiently large. The ac- 
curacy is improved by maintaining the normalization nu- 
merically for a unity valued test function. The MC method 



has the advantage that the solid angle points can vary 
from voxel to voxel (important for configurations with ve- 
locity fields where the transfer equation is solved in the 
locally co-moving frame). In the static case, the accu- 
racy of the MC method is only insignificantly worse than 
that of the deterministic quadrature, which indicates that 
the MC integration will be very useful in the case of 3D 
radiation transport in moving media. 

2.4.1. Computation of A* 

As demonstrat ed by lOlson et alJ 1)19871) and 

lOlson fc Kunasd l|l987|h the coefficients a, P, and 7 
can be used to construct diagonal and tri-diagonal 
A* operators for ID radiation transport problems. In 
fact, up to the full A matrix can be constructed by a 
straightforward extension o f the idea I Hauschildt et all 
Il994t IHauschildt fc Baronl l2004|h These non-local A* 
operators not only lead to excellent convergence rates 
but they also avoid the problem of false convergence 
that is inherent in the A iteration method and can also 
be an issue for diagonal (purely local) A* operators. 
Therefore, it is highly desirable to implement a non-local 
A* also for the 3D case. The tri-diagonal operator in the 
ID case is simply a nearest neighbor A* that considers 
the interaction of a point with its two direct neighbors. 
In the 3D case, the nearest neighbor A* considers the 
interaction of a voxel with the (up to) 3 3 — 1 = 26 
surrounding voxels (this definition considers a somewhat 
larger range of voxels that a strictly face-centered view of 
just 6 nearest neighbors). This means that the non-local 
A* requires the storage of 27 (26 surrounding voxels plus 
local, i.e., diagonal effects) times the total number of 
voxels A* elements. This can be reduced, for example if 
one considers only the voxels that share a face to a total 
of 7 elements for each voxel. However, this would ignore 
the effect of characteristics that pass 'diagonally' through 
a voxel and would therefore lead to a slower convergence 
rate. 

The construction of th e A* operator pro ceeds in the 
same way as discussed in lHa.uschildd ill 992^ . In the 3D 
case, the 'previous' and 'next' voxels along each char- 
acteristic must be known so that the contributions can 
be attributed to the correct voxel. Therefore, we use a 
data structure that attaches to each voxel its effects on its 
neighbors. The scheme can be extended trivially to include 
longer range interactions for better convergence rates (in 
particular on larger voxel grids), however the memory re- 
quirements to simply store A* ultimately scales like n 3 
where n is the total number of voxels. The storage re- 
quirements can be reduced by, e.g., using A*'s of different 
widths for different voxels. Storage requirements are not 
so much a problem if a domain decomposition paralleliza- 
tion method is used and enough processors are available. 
Below we will show some results for test cases with larger 
operators. 
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We describe here the general procedure of calcu- 
lating the A* with arbitrary bandwidth, up to the 
full A-operator, for the m ethod in spherical symme- 
try ijHauschildt et all Il994h. The c onstru ction of the 
A* is described in lOlson fc Kunasd l)l987h . so that we 
here summarize t he rel evant formulae. In the method of 
lOlson fc Kunasd l|l987h . the elements of the row of A* 
are computed by setting the incident intensities (bound- 
ary conditions) to zero and setting S(i x , i y , i z ) = 1 for one 
voxel (i x ,iy,i z ) and performing a formal solution analyt- 
ically. 

We describe the construction of A* using the 
example of a single characteristic. The contri- 
butions to the A* at a voxel j are given by 



A*. 







A j-ij = 7j-i 



for % < j - 1 (22) 

for i = j - 1 (23) 

Aj',i = Aj-ij exp(— Atj-i) + /3 1 - for i = j (24) 

Aj-|_i j = Ajj exp(— Atj) + otj+i for i = j + 1 (25) 

Ajj = Ai_ij cxp(— Ar;_i) for j + 1 < i (26) 

These contributions are computed along a characteristic, 
here i labels the voxels along the characteristic under 
consideration. These contributions are integrated over 
solid angle with the same method (either deterministic 
or through the Monte-Carlo integration) that is used for 
the computation of the J. For a nearest neighbor A*, the 
process of Eq.^Hlis stopped with i = j + 1, otherwise it is 
continued until the required bandwidth has been reached 
(or the characteristic has reached an outermost voxel and 
terminates) . 

2.5. Operator splitting step 

For a diagonal A* the solution of Eq. [S]is just a division 
by [1 — A*(l — e)] which requires very little effort. For the 
non-local A* operator, a matrix equation has to be solved 
at each step in order to compute J ne w For the nearest 
neighbor operator the matrix structure is that of a sparse 
band matrix where the bandwidth of the matrix is pro- 
portional to the square of the maximum number of points 
along a coordinate and there are a total of 27 non-zero 
entries in every column of the matrix. This system can 
be solved by any suitable method. For e xample, we have 
imple mented a solver based on LAPACK ( Anderson et al.l 
119921) routines. Although this solver works fine for small 
grids [(2 * 16 + l) 3 voxels with a bandwidth of 1123], the 
memory requirements rise beyond available single-CPU 
limits with already only (2 * 32 + l) 3 voxels with a band- 
width of 4291. For a more realistic grid of (2 * 256 + l) 3 
voxels the bandwidth is more than 250,000. Therefore, a 
standard band matrix solver is only useful for compari- 
son and testing on very small grids. Other s parse matrix 
solvers (for example. IXiaove fc Demmell 2003') have similar 
memory scaling properties. 



As an alternative to direct solutions of Eq. we have 
implemented iterative solvers. These solvers have the huge 
advantage that their memory requirements are minimal to 
modest. In addition, they can be tailored to the special for- 
mat of the matrix in Eq. El which makes coding the meth- 
ods very simple. We obtained extremely good results with 
the Jordan and the Gauss-Seidel methods, which converge 
rapidly to a relative accuracy of 10 -10 . For either method, 
Ng acceleration turned out to be very useful, reducing the 
number of iterations significant ly. In addition, the Rapido 
method (Zurmiihl & Falk 1986) which was constructed to 
solve systems of the form 

(1 - M)x = a (27) 
can be used. However, Rapido has strong constraints on 
the spectral radius of M and this cannot be used for prob- 
lems that involve substantial scattering. In the test calcu- 
lations discussed below we have used the Jordan or Gauss- 
Seidel solvers as they are the fastest of the solvers we have 
implemented. We verified that all solvers give the same re- 
sults. 



3. Application examples 

As a first step we have implemented the method as a MPI 
parallelized Fortran 95 program. The parallclization of the 
formal solution is presently implemented over solid angle 
space as this is the simplest parallelization option and also 
one of the most efficient (a domain decomposition paral- 
lelization method will be discussed in a subsequent pa- 
per). In addition, the Jordan solver of the Operator split- 
ting equations is parallelized with MPI (see below for scal- 
ing properties of the MPI implementation). The number 
of parallelization related statements in the code is small, 
about 320 out of a total of about 7900. 

Our basic continuum sca ttering test proble m is sim- 
ilar to that discussed in lHausc hildt (1992) and in 
lHauschildt fc Baronl l)2004|) . This test problem covers a 
large dynamic range of about 9 dex in the opacities and 
overall optical depth steps along the characteristics and, in 
our experience, constitutes a reasonably challenging setup 
for the radiative transfer code. The application of the 3D 
code to 'real' problems is in preparation and requires a 
substantial amount of development work (in progress). For 
the ID code we have found that the test case is actually 
pretty much a worst case scenario and that it generally 
works better in real world problems. We use a sphere 
with a grey continuum opacity parameterized by a power 
law in the continuum optical depth r st d ■ The basic model 
parameters are 

1. Inner radius r c = 10 13 cm, outer radius r out = 1.01 x 
10 15 cm. 

2. Minimum optical depth in the continuum t^™ = 10~ 4 
and maximum optical depth in the continuum t™^ x = 

3. Grey temperature structure with T g = 10 4 K. 
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4. Outer boundary condition 1^ = and diffusion inner 
boundary condition for all wavelengths. 

5. Continuum extinction Xc = C /r 2 , with the constant C 
fixed by the radius and optical depth grids. 

6. Parameterized coherent & isotropic continuum scat- 
tering by defining 

Xc = £ c k c + (1 - (c)<Jc (28) 

with < e c < 1. k c and o~ c are the continuum absorp- 
tion and scattering coefficients. 

The test model is just an optically thick sphere put into 
the 3D grid. This problem is used because the results can 
be directly compared with the results obtained with o ur 
ID spherical radiation transport code l(Hauschildtl 1992h to 
assess the accuracy of the method. The sphere is centered 
at the center of the Cartesian grid, which is in each axis 
10% larger than the radius of the sphere. For the test 
calculations we use voxel grids with the same number of 
spatial points in each direction (see below) . The solid angle 
space was discretized in (6, (ft) with ng — if not stated 
otherwise. In the following we discuss the results of various 
tests. In all tests we use the LC method for the 3D RT 
solution unless stated otherwise. 

3.1. LTE tests 

In this test we have set e = 1 to test the accuracy of 
the formal solution by comparing to the results of the ID 
code. The ID solver uses 50 radial points, distributed log- 
arithmically in optical depth. For the 3D solver we tested 
'small' grids with n x = n y = n z = 2 * 32 + 1 points along 
each axis, for a total of 65 3 f=s 2.7 x 10 5 voxels, as well 
as 'medium' (n x = n y = n z = 2 * 64 + 1 with a total 
of 129 3 « 2.1 X 10 6 voxels) and 'large (n x = n y = n z = 
2 * 96 + 1 with a total of 193 3 » 7.2 x 10 6 voxels). The 
large grid was limited by available memory for the stor- 
age of the non-local A* operator. The solid angle space 
discretization uses, in general, ng — — 64 points. In 
Fig. [5] we show the mean intensities as function of dis- 
tance from the center for both the ID (+ symbols) and 
the 3D solver. The results show excellent agreement be- 
tween the two solutions, thus the 3D RT formal solution is 
comparable in accuracy with the ID formal solution. We 
demonstrate below that for the conditions used in these 
tests a larger number of solid angle points significantly 
improves the accuracy of the mean intensities. 

3.2. Tests with continuum scattering 
3.2.1. e = 10~ 4 

For this test, we use the same basic structure setup as for 
the LTE test, however we now use e = 10~ 4 for a scat- 
tering dominated atmosphere. The comparison with the 
results obtained with the ID RT code (with 100 radial 
points) are compared to the 3D RT code results in Fig. 01 



Note the factor of about 1000 difference between the so- 
lution for e = 10~ 4 and the results of the formal solution 
with S = B shown in Fig. [21 at the largest distances (the 
iterations were started with S = B). Figure shows the 
results for the 'large' and 'medium' spatial grids and for 
2 different solid angle discretizations (64 2 and 16 2 solid 
angle points). For both the ID and the 3D code the mean 
intensities were iterated to a relative accuracy of 10~ 8 
(see below for details on convergence rates). The graph 
highlights the need for a rather fine solid angle grid for 
the test problem. With a small number of angular points 
(bottom panel in Fig.OJ the numerically induced spread of 
the mean intensities is significantly larger than with a 4 2 
finer angular grid (shown in the middle panel). A change 
in spatial grid resolution has a much smaller effect on the 
results than the number of solid angle points as shown in 
the top two panels of Fig. The spatial resolution of the 
'small' grid is, however, too coarse to represent the test 
problem, in particular in the inner parts of the structure. 

The importance of the angular resolution is also 
demonstrated in Fig. 01 which shows the contours of the 
mean intensity on the six 'faces' of the voxel cube for 129 3 
spatial points and 16 2 (left panel) and 64 2 (right panel) 
solid angles. The calculation with 64 2 solid angle points 
shows symmetric contours whereas the smaller 16 2 angle 
points model shows asymmetries and 'banding' like struc- 
tures on all faces (in particular on the x — y faces). This 
clearly indicates that for the test conditions used the an- 
gular resolution has to be larger than 16 2 for accurate 
solutions. This can also be seen in surface plots of the 
mean intensities at the — n x faces of the z — y planes for 
both calculations, cf. Fig. [5J The surface calculated with 
64 2 angles is much smoother and shows no or little band- 
ing compared to the surface for 16 2 angular points. The 
effects of higher spatial resolution on the J surface at the 
z — y face is shown in Fig. EJ Clearly, 64 2 angles produce 
a smooth surface without significant artifacts for the test 
case. 

The convergence properties of the various methods for 
this test case are shown in Fig. As expected, the A 
iteration converges very slowly and requires more than 
300 iterations to reach a relative error of less than 10~ 8 . 
The diagonal A* operator converges significantly better 
than the A iteration but is still rather slowly convergent. 
The nearest neighbor A* converges substantially faster (by 
more than a factor of 3 than the diagonal A*). The diago- 
nal and nearest nei ghbor iter ations can be accelerated by 
using Ng's method lNelll974h . as shown in Fig. In both 
4th order Ng acceleration was used after 20 initial 
iterations, starting Ng acceleration too early can lead to 
convergence failures since Ng acceleration is based on ex- 
trapolation. An attempt to apply the Ng acceleration to 
the A iteration was not successful, similar to the ID case, 
as the convergence rate of the A iteration appears to be 
too small to be useful for the Ng method. For compari- 
son, we show in Fig. the convergence properties of the 
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ID RT solver with a tri-diagonal A* and Ng acceleration 
(here started much earlier). The convergence rates of the 
ID tri-diagonal and 3D nearest neighbor methods are very 
comparable. The convergence properties are also relevant 
for the overall speed of the method: whereas the time for 
the formal solution (for a given number of voxels and pro- 
cessors) depends roughly linearly on the total number of 
solid angle points, the time for the solution of the opera- 
tor splitting step does not depend on the number of angle 
points and actually decreases with iteration number if the 
Jordan, Gauss-Seidel or Rapido solvers are used. There- 
fore, the nearest neighbor operator will become more and 
more efficient as the number of angles and/or voxels in- 
creases. 

The smaller initial corrections of the A iteration are 
actually an indication that it just corrects too little, the 
operator splitting method gives initially much larger cor- 
rections. This is exactly like similar test cases in ID that 
we have tried. The initial corrections are so large because 
the initial guesses are very wrong (intentionally) so that 
between initial guess and final result we have changes by 
close to 10 orders of magnitude. This means that the ini- 
tial corrections of the operator splitting method have to 
be large. 

The convergence rates will depend on the grid resolu- 
tion as the optical depths between voxels will be smaller 
for larger resolutions and thus the coupling between voxels 
will be stronger. This effect is shown in Fig. 00 where we 
show the convergence rates for various grids. The conver- 
gence rates are independent of the number of solid angle 
points but depend weakly on the number of grid points, 
as expected. Thus, for voxel grids with higher resolution, 
a larger A* (more neighbors) will be more useful than for 
coarser voxel grids. 

Figure [5] shows the convergence rates for the e = 1CP 4 
test case for different A* bandwidths. The wider A*'s lead 
to improved convergence rates; however, the net wallclock 
time increases for the wider A*'s in the parallel code. This 
is caused by the substantially larger amount of informa- 
tion that needs to be passed between processes in order 
to build the wider A*. In addition, the memory require- 
ments for a given number of voxels scales like the cube of 
the number of neighbors considered in the A*. As Fig. 
demonstrates, the convergence is strongly improved by the 
use of Ng acceleration, however, for larger A*'s the Ng 
method can be detrimental for wider A* compared to nar- 
rower A*. 

3.2.2. e = KT 8 

The final test we present in this paper considers a case 
with a much larger scattering contribution: e = 10~ 8 . The 
results for a grid with 129 3 voxels and 64 2 angular points 
are shown in Fig. ^] Fig. ^2 shows the results for slices 
along the coordinate axes (the two coordinates being cen- 
tered, respectively). Even though the dynamic range of 



the mean intensities is huge, nearly 12 dex, the results are 
quite accurate. For this case the lack of spatial resolution 
in the inner parts of the voxel grid can be seen. Here |VJ 
is huge and cannot be fully resolved by the 3D RT code 
(the ID code naturally has much higher resolution) . How- 
ever, only a few voxels away from the center the results 
agree very well. 

The convergence plots in Fig El show the results for 
a very difficult test case with t^ x — 10 8 for a fixed 
voxel and solid angle grid with 65 3 voxels and 16 2 angular 
points. For this test, the A iteration fails completely. The 
diagonal operator provides significant speed-up, but still 
requires more than 1600 iterations to reach the required 
convergence limit. For this test, the Ng acceleration does 
not work with the diagonal operator, the iteration pro- 
cess failed immediately after it was started. It is likely 
that a better result could be obtained if Ng acceleration is 
started in the steep part of the diagonal operator's conver- 
gence, e.g., after about 500 regular iterations. The nearest 
neighbor operator leads to much faster convergence, even 
without Ng acceleration the solution converges in about 
450 iterations. Here, Ng acceleration works very well with 
the nearest neighbor operator, convergence is reached after 
177 iterations. This is still about a factor of 2 more than 
for the ID code, but much better than with the diagonal 
operator. The variation of the convergence rate for the 
nearest neighbor operator and Ng acceleration with the 
size of the solid angle grid is shown in Fig. ^5] The case 
with the smallest angular grid (16 2 points) actually con- 
verges more slowly than the 32 2 and 64 2 grids. The higher 
resolution grids convergence rate compares well with the 
ID code. This highlights the importance of the non-local 
A* operator and a large enough solid angle grid for rapid 
convergence and accuracy. 

3.2.3. Visualization 

We have implemented a simple visualization of the results 
in order to view images of the emitted intensities. The 
visualization uses IDL to read the results of the 3D RT 
code, performs a formal solution for a specific (9, 4>) and 
displays the result as an image of the intensities leaving 
the voxel grid. These figures are extremely helpful for 
discovering even small problems with the generation and 
handling of the characteristics or issues due to low reso- 
lution in solid angle or space. Such problems will imme- 
diately show up as asymmetries in the generated images. 
We show the results for the LTE test (with 65 3 voxels) in 
Fig. The limb darkening of the test sphere is clearly 
visible in the figures. The slight pixellation and asymme- 
tries are due to spatial and angular resolution. Note that 
the scales of the different panels are different due to the 
changing orientation of the data cube. The generated im- 
ages for the e = 10 -4 test case with 129 3 and 193 3 voxels 
are shown in Figs. 1151 and 1161 respectively. For both fig- 
ures, the source functions were calculated with 64 2 angles. 
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The images show much less pixellation and far less arti- 
facts than the images shown in Fig.^J The results for the 
e = 10~ 8 test case are shown in Fig. El they look similar 
to the e = 10 -4 case with the same grid sizes. 

3.2.4. MPI scaling properties 

Figure IT8l shows the scaling properties of the MPI version 
of the 3D radiation transport code for the e = 10 -8 test 
case. The runs were performed on 2 parallel compute clus- 
ters, one equipped with 1.8GHz dual Opteron CPUs and 
an Infiniband interconnect from Delta computer and one 
equipped with 2.0GHz dual G5 CPUs with Gbit ethernet 
network from Apple computer (Xserves). The speedup we 
obtain in the MPI version is close to optimal, about a fac- 
tor of 28 with 32 MPI processes on 32 CPUs (or 16 com- 
pute nodes). The fact that the speedup is very good shows 
that the load balancing is optimal and that the time spent 
in the MPI communication routines is negligible compared 
to the compute times. 

With 129 3 voxels, the code uses about 0.6GB of mem- 
ory. With 10 CPUs and 64 2 angles, the wallclock time 
for a formal solution is about 310 sec (400 sec) on 2.0GHz 
Xserve G5s (on 1.8GHz Opterons), 9 sec (9 sec) for the 
required MPI communication, and between 3-26 sec (12- 
120 sec) to solve the linear system. Since the linear system 
is solved iteratively, the time for the solution is reduces as 
the overall convergence limit is approached. 

4. Conclusions 

We have described a framework for solving three- 
dimensional radiative transfer problems in scattering dom- 
inated environments. The method uses a non-local oper- 
ator splitting technique to solve the scattering problem. 
The formal solution is based on a long characteristic piece- 
wise parabolic procedure. For strongly scattering domi- 
nated test cases (sphere in a box) we find good conver- 
gence with non-local A* operators, as well as minimal nu- 
merical diffusion with the long characteristics method and 
adequate resolution. A simple MPI parallelization gives 
excellent speedups on parallel clusters. In subsequent work 
we will implement a domain decomposition method to al- 
low much larger spatial grids. Presently, we have imple- 
mented the method for static media, it can be used with- 
out significant changes to solve problems in the Eulerian- 
frame for media with low velocity fields. The distribution 
of matter over the voxels is, in the general 3D case, arbi- 
trary. We chose a spherical test case to be able to compare 
the results our ID code. 

In Figs. EH an d GO w e compare the results for 
the e = 10~ 4 test case using a simple implemen- 
tation of the short characteristics method and the 
long characteristics method used in this paper. The 
test grid contains 129 3 voxel and 64 2 angular points. 
The high diffusivity of the SC method is evident. 
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SC methods in multi-dimensional radiative transport 
problems. Short characteristics techniques are faster but 
requi re special co nsiderations to reduce numerical diffu- 
sion (|Auerl l2003h . We may further look into the SC 
method in later papers. 

We have generalized the operator splitting to include 
larger bandwidth operators. They lead to faster conver- 
gence although they do require more memory and ulti- 
mately more computing time. Nevertheless, they will be 
useful for highly complex problems and we have developed 
a highly flexible approach to the construction of the A* 
operator so that the bandwidth may be set for each spa- 
tial point individually as the problem and computational 
resources require. 

We have designed an especially general and flexible 
framework for 3D radiative transfer problems with scat- 
tering. In future papers of this series we will describe its 
extension to line transfer problems, multi-level NLTE cal- 
culations, and differentially moving flows. 
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Fig. 2. Comparison of the results obtained for the LTE 
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x axis shows the distances from the center of the sphere, 
the y axis the log of the mean intensity. 
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symbols) and the 3D solver. The x axis shows the dis- 
tances from the center of the sphere, the y axis the log of 
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(spatial; solid angle) grid with (193 3 ; 64 2 ) points, the mid- 
dle panel for (129 3 ;64 2 ) points and the bottom panel for 
(129 3 ; 16 2 ) points. 

Hauschildt, P. H., Storzer, H., & Baron, E. 1994, JQSRT, 
51, 875 

Hummel, W. 1994a, Astrophys. Space. Sci., 216, 87 
Hummel, W. 1994b, A&A, 289, 458 
Hummel, W. & Dachs, J. 1992, A&A, 262, L17 
Mihalas, D. 1978, Stellar Atmospheres (New York: W. H. 

Freeman) 
Mihalas, D. 1980, ApJ, 237, 574 

Mihalas, D., Kunasz, P., & Hummer, D. 1975, ApJ, 202, 
465 

Ng, K. C. 1974, J. Chem. Phys., 61, 2680 

Norman, M. L. 2000, in Revista Mexicana de Astronomia 

y Astrofisica Conference Series, Vol. 9, 66-71 
Olson, G. L., Auer, L. H., & Buchler, J. R. 1987, JQSRT, 

38, 431 

Olson, G. L. & Kunasz, P. B. 1987, JQSRT, 38, 325 
Papkalla, R. 1995, A&A, 295, 551 

Rijkhorst, E.-J., Plewa, T., Dubey, A., & Mellema, G. 

2005, A&A, in press, astro-ph/0505213 
Steiner, O. 1991, A&A, 242, 290 

Trujillo Bueno, J. & Fabiani Bendicho, P. 1995, ApJ, 455, 
646 

Turek, S. 1993, preprint 

van Noort, M., Hubeny, I., & Lanz, T. 2002, ApJ, 568, 
1066 

Vath, H. M. 1994, A&A, 284, 319 

Xiaoye, S. L. & Demmel, J. W. 2003, ACM Transaction 

on Mathematical Software, 29, 110 
Zurmiihl, R. & Falk, S. 1986, Matrizen und ihre Anwen- 

dungen, 5th edn., Vol. 2 (Berlin: Springer- Verlag) 



Fig. 5. Mean intensity surfaces at the z ~ y face for the 
test case with e = 10~ 4 , 129 3 spatial points, and 16 2 (left 
panel) and 64 2 solid angle points. The axes are labeled by 
voxel index with (x, y, z) = (0, 0, 0) being the center of the 
voxel grid. Each plot shows one outside face of the voxel 
cube (the physical scales are the same in all directions). 

Fig. 6. Mean intensity surface at the z — y face for the 
test case with e = 10~ 4 with 193 3 spatial points and 64 2 
solid angle points. The axes are labeled by voxel index 
with (x, y, z) = (0, 0, 0) being the center of the voxel grid. 

Fig. 7. Convergence properties of the codes for the e = 
10~ 4 test case. The labels indicate the different methods 
used. The 3D test runs use 65 3 spatial and 16 2 angular 
points. 

Fig. 8. Convergence properties of the codes for the e = 
10~ 4 test case and different grid sizes. The labels indicate 
the different grid sizes used, all but the A iteration use the 
nearest neighbor operator with Ng acceleration. 

Fig. 9. Convergence properties of the e = 10~ 4 test case 
for various A* operator bandwidth choices with and with- 
out Ng acceleration. 

Fig. 10. Comparison of the results obtained for the scat- 
tering dominated (e = 10~ 8 ) test with the ID solver (+ 
symbols) and the 3D solver. The x axis shows the dis- 
tances from the center of the sphere, the y axis the log 
of the mean intensity. The graph shows the results for a 
(spatial; solid angle) grid with (129 3 ; 64 2 ) points. Note the 
large dynamic range (12 dex) of the mean intensities. 

Fig. 11. Comparison of the results obtained for the scat- 
tering dominated (e = 10~ 8 ) test with the ID solver (+ 
symbols) and the 3D solver for slices along the x, y, and z 
axes. The plot shows the results for a (spatial; solid angle) 
grid with (129 3 ; 64 2 ) points. Note the large dynamic range 
(12 dex) of the mean intensities. 

Fig. 12. Convergence properties of the codes for the 
e = 10~ 8 test case. The labels indicate the different meth- 
ods used. These test runs use 65 3 spatial and 16 2 angular 
points. 

Fig. 13. Convergence properties of the codes for the e = 
10 -8 test case and different angular grid sizes. The labels 
indicate the different grid sizes used, all but the A iteration 
use the nearest neighbor operator with Ng acceleration. 
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Fig. 4. Mean intensity contour plots for the test case with e = 1CP 4 with 129 3 spatial points and 16 2 (left panel) and 
64 2 solid angle points. The axes are labeled by voxel index with (x, y, z) — (0, 0, 0) being the center of the voxel grid. 
Each plot shows one outside face of the voxel cube (the physical scales are the same in all directions). 



Fig. 14. Visualization of the results for the LTE case. The 
voxel grid has 65 3 elements. The intensity image is shown 
for (8, 4>) = (0, 0) (upper left panel), (45, 45) (bottom right 
panel), (140,250) (upper right panel), and (89, 139) (bot- 
tom right panel) degrees. The intensities are mapped lin- 
early to 255 shades of gray. The direction of the axes are 
given with axes lengths corresponding to a total of 50 vox- 
els. The borders of the front faces of the voxel cube are 
shown, their widths corresponds to the apparent width 
of a voxel. Note that the different panels are at different 
scales. 

Fig. 15. Visualization of the results for the e = 10~ 4 
case with a 129 3 elements voxel grid. The intensity image 
is shown for (6*, (f>) = (0,0) (upper left panel), (45,45) 
(bottom right panel), (140,250) (upper right panel), and 
(89, 139) (bottom right panel) degrees. The intensities are 
mapped linearly to 255 shades of gray. The direction of the 
axes are given with axes lengths corresponding to a total 
of 100 voxels. The borders of the front faces of the voxel 
cube are shown, their widths corresponds to the apparent 
width of a voxel. Note that the different panels are at 
different scales. 

Fig. 16. Visualization of the results for the e = 10~ 4 case 
with a 193 3 voxel grid. The intensity image is shown for 
{6,4>) = (0,0) (upper left panel), (45,45) (bottom right 
panel), (140,250) (upper right panel), and (89, 139) (bot- 
tom right panel) degrees. The intensities are mapped lin- 
early to 255 shades of gray. The direction of the axes are 
given with axes lengths corresponding to a total of 100 
voxels. The borders of the front faces of the voxel cube 
are shown, their widths corresponds to the apparent width 
of a voxel. Note that the different panels are at different 
scales. 

Fig. 17. Visualization of the results for the e = 10 -8 
case with a 129 3 elements voxel grid. The intensity image 
is shown for (6,<j)) — (0,0) (upper left panel), (45,45) 
(bottom right panel), (140,250) (upper right panel), and 
(89, 139) (bottom right panel) degrees. The intensities are 
mapped linearly to 255 shades of gray. The direction of the 
axes are given with axes lengths corresponding to a total 
of 100 voxels. The borders of the front faces of the voxel 
cube are shown, their widths corresponds to the apparent 
width of a voxel. Note that the different panels are at 
different scales. 
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Fig. 18. Scaling properties of the MPI version of the 3D 
RT code for parallel clusters based on Opterons and G5 
CPUs. In absolute scales the G5s are about 30% faster 
than the Opterons. 

Fig. 20. Comparison of the mean intensity surfaces at 
the z — y face for the test case with e = 10~ 4 with 129 3 
spatial points and 64 2 solid angle points. The left panel 
shows the results obtained with the short characteristics 
methods whereas the right panel shows the results of the 
long characteristics method. The axes are labeled by voxel 
index with (x, y, z) — (0, 0, 0) being the center of the voxel 
grid. Each plot shows one outside face of the voxel cube 
(the physical scales are the same in all directions). 
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Fig. 19. Comparison of the mean intensity contour plots for the test case with e = 1CT 4 with 129 3 spatial points and 
64 2 solid angle points. The left panel shows the results obtained with the short characteristics method whereas the right 
panel shows the results of the long characteristics method. The axes are labeled by voxel index with (x, y, z) — (0, 0, 0) 
being the center of the voxel grid. Each plot shows one outside face of the voxel cube (the physical scales are the same 
an all directions). 



